Prediction of the potential geographical distribution of Betula platyphylla Suk. in China under climate change scenarios

Climate is a dominant factor affecting the potential geographical distribution of species. Understanding the impact of climate change on the potential geographical distribution of species is of great significance to the exploitation, utilization, and protection of resources, as well as to ecologically sustainable development. Betula platyphylla Suk. is one of the most widely distributed temperate deciduous tree species in East Asia and has important economic and ecological value. Based on 231 species distribution data points of Betula platyphylla Suk. in China and 37 bioclimatic, soil, and topography variables (with correlation coefficients < 0.75), the potential geographical distribution pattern of Betula platyphylla Suk. at present and in the 2050s and 2070s under Representative Concentration Pathway (RCP) climate change scenarios was predicted using the MaxEnt model. We analyzed the main environmental variables affecting the distribution and change of suitable areas and compared the extent and change of suitable areas under different climate scenarios. This study found: (1) At present, the main suitable area for Betula platyphylla Suk. extends from northeastern to southwestern China, with a fragmented distribution at the periphery. (2) Annual precipitation, precipitation of the warmest quarter, mean temperature of the warmest quarter, annual mean temperature, and precipitation of the driest month are the dominant environmental variables affecting the potential geographical distribution of Betula platyphylla Suk. (3) The suitable area for Betula platyphylla Suk. is expected to expand under global warming scenarios. In recent years, owing to diseases, insect infestation, and environmental damage, the natural Betula platyphylla Suk. forest in China has gradually shrunk. This study accurately predicted the potential geographical distribution of Betula platyphylla Suk. under current and future climate change scenarios, which can provide a scientific basis for the cultivation, management, and sustainable utilization of Betula platyphylla Suk. resources.

[37], and B. platyphylla multi-sanctuaries, multi-directional expansion, heterogeneous genetic models, etc. [29]. However, few scholars have predicted the potential geographical distribution pattern of B. platyphylla and its dominant environmental variables under future climate change scenarios. In this study, we collected and screened species distribution data of B. platyphylla and, combined with soil, topography, and other related environmental data, used the MaxEnt model and the spatial analysis functions of ArcGIS 10.3 to simulate the current potential geographical distribution pattern of B. platyphylla from current climate data, explore its main environmental variables, and predict its distribution from climate data for the 2050s and the 2070s. We then assessed the future potential geographical distribution pattern of B. platyphylla in China and its response to different climate change scenarios. This study provides a scientific basis for resource investigation and sustainable use of B. platyphylla and can serve as an important reference for the future management and cultivation of B. platyphylla forests.

Species occurrence data
Species occurrence data of B. platyphylla from 1970 to 2020 were obtained from the Global Biodiversity Information Facility (https://www.gbif.org/), the National Specimen Information Infrastructure (http://www.nsii.org.cn/), and the Herbarium of the Institute of Botany, Chinese Academy of Sciences (http://pe.ibcas.ac.cn/) (S1 Table). Occurrence records were selected according to the following principles. First, only records of Betula platyphylla Suk. were included; records of Betula platyphylla var. mandshurica and Betula platyphylla var. szechuanica were excluded. Second, each record had to have complete longitude and latitude information to ensure geographical accuracy. For records that lacked geographical coordinates but had other detailed information, the Baidu coordinate picking system was used to obtain the corresponding latitude and longitude. Third, where occurrence data had been sampled multiple times in different years, only one record was kept. Fourth, to match the 1 km × 1 km spatial resolution of the environmental variables, the study area was divided into 1 km² grid cells, and only one record was kept in each cell [38]. These operations greatly reduce the spatial autocorrelation of the occurrence data and thereby reduce error. Finally, 231 occurrence records were used for modeling (Fig 1).
The 19 bioclimatic variables for the current and future scenarios were downloaded from the WorldClim dataset (http://www.worldclim.org/) [39]. The current climate dataset was generated by interpolating weather data observed during 1970-2000 using a thin-plate smoothing spline [21]. The global climate model (GCM) data we used are based on the Coupled Model Intercomparison Project Phase 5 (CMIP5); compared with CMIP6 GCMs, CMIP5 GCMs still have a higher spatial resolution so far. The future climate scenarios were represented by data for the 2050s (the average for 2040-2060) and the 2070s (the average for 2060-2080) modeled by the Community Climate System Model version 4 (CCSM4) under four future greenhouse gas concentration trajectories (RCP2.6, RCP4.5, RCP6.0, and RCP8.5). CCSM4 is one of the most effective GCMs for predicting the impact of future climate change on the distribution of animal and plant species and has been widely used in previous studies [40,41]. RCP2.6, RCP4.5, RCP6.0, and RCP8.5 represent low, slightly lower, slightly higher, and high greenhouse gas concentration scenarios, respectively. Under these scenarios, by the end of this century (2081-2100), the global average temperature is projected to increase by 0.3-1.7˚C under RCP2.6, 1.1-2.6˚C under RCP4.5, 1.4-3.1˚C under RCP6.0, and 2.6-4.8˚C under RCP8.5 [11]. To avoid overlooking subtle changes in species distribution caused by different future climate change scenarios, we included all four scenarios (RCP2.6, RCP4.5, RCP6.0, and RCP8.5) in the prediction of the future potential geographical distribution of B. platyphylla.
The three topography variables were derived from the digital elevation model of China provided by the National Tibetan Plateau Data Center (http://data.tpdc.ac.cn). The 36 soil variables were obtained from the National Cryosphere Desert Data Center (http://www.ncdc.ac.cn). We used data from the world soil database established by the Food and Agriculture Organization and the International Institute for Applied Systems Analysis; the data source for China was the 1:1,000,000 soil data from the Nanjing Soil Survey of the second national land survey [42]. The spatial resolution of all the above environmental variables was 1 km. The vector boundary was obtained from Natural Earth (http://www.naturalearthdata.com/). Based on the principle of national and territorial integrity, we modified and adjusted the vector boundary.
Most studies select only bioclimatic and topography variables for modeling, but soil variables are also important factors affecting species distribution. Stanton et al. (2012) suggested that combining important static variables with dynamic bioclimatic variables produces better results than excluding static variables [43]. Therefore, in addition to bioclimatic and topography variables, we also included soil variables, and we assumed that soil and topography variables would not change during the simulation of the potential geographical distribution under climate change. The complete list of environmental variables is given in S2 Table.
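The one-record-per-grid-cell filter described for the occurrence data can be sketched as follows. This is a minimal illustration, not the procedure the authors ran: it assumes the coordinates have already been projected to metres in some planar coordinate system, and the function name `thin_to_grid` and the sample points are hypothetical.

```python
from math import floor


def thin_to_grid(points, cell_size_m=1000.0):
    """Return at most one point per square grid cell, keeping the first seen.

    points: iterable of (x, y) coordinates in metres (assumed to be projected).
    """
    kept = {}
    for x, y in points:
        # Integer index of the 1 km x 1 km cell that contains the point.
        cell = (floor(x / cell_size_m), floor(y / cell_size_m))
        kept.setdefault(cell, (x, y))
    return list(kept.values())


# Hypothetical records: the first two fall in the same 1 km cell.
points = [(120.0, 80.0), (950.0, 400.0), (1500.0, 300.0), (1501.0, 2999.0)]
thinned = thin_to_grid(points)
```

Keeping the first record in each cell is an arbitrary choice made for the sketch; the text only states that a single record per cell was retained.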

Environmental variables
Multicollinearity among environmental variables affects the prediction results of the model and can lead to overfitting [44]. Thus, correlation analysis and screening of environmental variables can improve model prediction accuracy. To eliminate the influence of multicollinearity on the model results, we took the following measures. First, the 58 environmental variables were tested with the Jackknife test in the MaxEnt model to evaluate the contribution rate of each variable, and variables with a contribution rate of 0 were eliminated. Second, for the variables with a contribution rate > 0, a Spearman rank correlation test was conducted on the soil and climate variables using SPSS ver. 21.0 (IBM Corp., Armonk, NY, USA). Environmental variables with a correlation coefficient < 0.75 were retained. For environmental variables with a correlation coefficient ≥ 0.75, only the variable with the larger contribution rate was retained [45,46]; the one with the smaller contribution rate was excluded. Finally, 37 environmental variables were selected for the modeling analysis (Table 1).
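The two-step screening above can be illustrated with a short sketch. It is written under several assumptions that the text does not confirm: the absolute value of the Spearman coefficient is compared with 0.75, correlated pairs are resolved greedily in descending order of contribution rate, and all variable values and contribution rates below are made up for illustration. The authors used MaxEnt and SPSS, not this code.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def screen_variables(samples, contribution, threshold=0.75):
    """Drop zero-contribution variables, then keep the higher-contribution
    member of every pair whose |rho| reaches the threshold."""
    candidates = sorted(
        (name for name in samples if contribution[name] > 0),
        key=lambda name: contribution[name],
        reverse=True,
    )
    kept = []
    for name in candidates:
        if all(abs(spearman(samples[name], samples[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Made-up values sampled at six hypothetical occurrence points.
samples = {
    "bio12": [400, 550, 700, 820, 900, 1100],
    "bio18": [210, 300, 390, 450, 520, 640],  # rises with bio12, so rho = 1
    "bio1": [6.1, 2.3, 8.4, 1.0, 5.2, 3.7],
    "t_bs": [55, 60, 58, 62, 57, 61],
}
contribution = {"bio12": 30.0, "bio18": 12.0, "bio1": 9.0, "t_bs": 0.0}
kept = screen_variables(samples, contribution)
```

In this toy input, `t_bs` is removed in the first step because its contribution rate is 0, and `bio18` is removed in the second step because it is perfectly rank-correlated with the higher-contribution `bio12`.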

MaxEnt modeling
In this study, MaxEnt 3.4.1 (http://www.cs.princeton.edu/~schapire/maxent/) was used for the simulation. The processed occurrence data of B. platyphylla and the 37 screened environmental variables were imported into the MaxEnt model. Modeling followed the standard protocol for reporting species distribution models of Zurell et al. (2020) [47].
The feature classes were set to linear, quadratic, product, and hinge features, and "Create response curves", "Make pictures of predictions", and "Do jackknife to measure variable importance" were selected to interpret how individual variables affect the presence probability of B. platyphylla. In the basic settings, "Random test percentage" was set to 25, so that 75% of the sample data were randomly selected as the training set and the remaining 25% were used as the test set to verify the model. The "Regularization multiplier" was set to 1 to limit model complexity and reduce overfitting by controlling the intensity of the chosen feature classes. The "Max number of background points" was set to 10000 and "Replicates" to 10. In the advanced settings, "Maximum iterations" was set to 500 and the "Convergence threshold" to 0.0001. The output format was set to "Cloglog"; a previous study has shown that the "Cloglog" output is the optimal output mode for predicting the suitable area [48].
After the model was established, the area under the curve (AUC) of the receiver operating characteristic curve was used to evaluate model accuracy [49][50][51]. AUC values range from 0 to 1, and larger values represent better prediction results. The evaluation criteria were as follows: 0.50-0.60, the prediction fails and has no credibility; 0.60-0.70, the prediction is poor and credibility is low; 0.70-0.80, the prediction is fair and credibility is moderate; 0.80-0.90, the prediction is good and relatively reliable; 0.90-1.00, the prediction is very accurate and reliable. According to a previous study, models with AUC > 0.85 are sufficiently accurate to predict the potential geographical distribution of species under climate change scenarios [52].
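As a rough illustration of the evaluation step, the sketch below computes AUC as the share of presence/background pairs in which the presence point receives the higher score, and then maps the value onto the five bands listed above. The function names, the shortened band labels, the inclusive lower bounds, and the sample scores are assumptions made for the example; MaxEnt computes and reports AUC itself.

```python
def auc(presence_scores, background_scores):
    """Probability that a random presence point outranks a random
    background point; tied pairs count as half a win."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def auc_grade(value):
    """Map an AUC value onto the five bands quoted in the text.

    Lower bounds are treated as inclusive (an assumption), and values
    below 0.50 are also reported as "fail".
    """
    if value >= 0.90:
        return "very accurate"
    if value >= 0.80:
        return "good"
    if value >= 0.70:
        return "fair"
    if value >= 0.60:
        return "poor"
    return "fail"


# Hypothetical predicted suitability at test presences and background points.
presence_scores = [0.9, 0.8, 0.35]
background_scores = [0.1, 0.4, 0.2, 0.8]
value = auc(presence_scores, background_scores)
grade = auc_grade(value)
```

With these made-up scores, 9.5 of the 12 pairs are ranked correctly, so the AUC is about 0.79 and falls in the "fair" band.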

Importance assessment of environmental variables
Among the outputs of the MaxEnt model, the Jackknife method, the percentage contribution rate, and the permutation importance value can be used to evaluate the importance of environmental variables for the potential geographical distribution of B. platyphylla. The Jackknife method evaluates the importance of each environmental variable by comparing the regularized training gain, test gain, and AUC value of models built with that variable alone and with that variable excluded [53]. For the percentage contribution rate, the training algorithm adjusts the coefficient of one feature at a time to increase the gain of the model; each increase in gain is assigned to the environmental variable on which that feature depends, and the totals are converted into percentages [54]. For the permutation importance value, the values of each variable in the training data are randomly permuted in turn, the resulting change in training AUC is calculated, and the results are normalized to percentages [55].
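The permutation importance idea can be sketched in a few lines. This is a simplified illustration rather than MaxEnt's implementation: the model is a hypothetical scoring function `toy_model`, the data are synthetic, negative drops are clipped to zero, and the shuffle is applied jointly to presence and background rows (all assumptions made for the sketch).

```python
import random


def auc(presence_scores, background_scores):
    """Share of presence/background pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            wins += 1.0 if p > b else 0.5 if p == b else 0.0
    return wins / (len(presence_scores) * len(background_scores))


def permutation_importance(predict, presence, background, rng):
    """Return one percentage per variable: the normalized drop in AUC
    after that variable's values are shuffled across all rows."""
    rows = presence + background
    n_presence = len(presence)

    def score(all_rows):
        scores = [predict(row) for row in all_rows]
        return auc(scores[:n_presence], scores[n_presence:])

    baseline = score(rows)
    drops = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        rng.shuffle(column)
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
        drops.append(max(0.0, baseline - score(shuffled)))
    total = sum(drops)
    return [100.0 * d / total if total > 0 else 0.0 for d in drops]


# Synthetic data: variable 0 separates presence from background,
# variable 1 is unrelated to the model output.
presence = [[100.0 + i, float(i % 5)] for i in range(20)]
background = [[float(i), float((i * 3) % 5)] for i in range(20)]


def toy_model(row):
    """Hypothetical model whose score depends on variable 0 only."""
    return row[0]


importance = permutation_importance(toy_model, presence, background, random.Random(0))
```

Because `toy_model` ignores variable 1, shuffling that variable leaves the AUC unchanged and its importance is 0, while all of the normalized importance goes to variable 0.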

Division of suitable area and analysis of spatial pattern change
We used ArcGIS 10.3 software to divide and visualize the suitable area for B. platyphylla. Based on the maximum training sensitivity and specificity threshold (0.2932) generated from the MaxEnt model, the suitable area for B. platyphylla was classified [56,57]: < 0.2932, unsuitable area; 0.2932-0.40, less suitable area; 0.40-0.60, moderately suitable area; and > 0.60, highly suitable area.
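The four-class scheme can be written as a small function. This is an illustrative sketch only (the authors performed the classification in ArcGIS); how values exactly on a class boundary are assigned, other than 0.60 and the 0.2932 threshold, is an assumption.

```python
THRESHOLD = 0.2932  # maximum training sensitivity plus specificity threshold from the text


def suitability_class(probability):
    """Assign a presence probability to one of the four suitability classes."""
    if probability < THRESHOLD:
        return "unsuitable"
    if probability < 0.40:
        return "less suitable"
    if probability <= 0.60:
        return "moderately suitable"
    return "highly suitable"


# Hypothetical cell values from a presence probability raster.
classes = [suitability_class(p) for p in (0.10, 0.2932, 0.45, 0.60, 0.75)]
```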
To show the change in suitable area for B. platyphylla more intuitively, and following previous studies [58,59], ArcGIS 10.3 software was used to convert the presence probability raster of B. platyphylla into a binary map according to the threshold value (1 = suitable area, 0 = unsuitable area), overlay the distribution maps of different periods, and obtain the spatial change in suitable area under the climate change scenarios using the raster calculator tool. In the output, 0 represented a region with neither present nor future suitable area, 1 represented shrinkage (suitable at present but not in the future), 2 represented expansion (suitable in the future but not at present), and 3 represented stable suitable area in both the present and the future.
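One way to reproduce the 0/1/2/3 coding is to add the present binary map to twice the future binary map. The sketch below assumes this arithmetic, which matches the stated codes but is not confirmed as the exact expression the authors entered in ArcGIS; the small grids are made-up probabilities.

```python
THRESHOLD = 0.2932  # suitability threshold from the text


def change_code(present_prob, future_prob, threshold=THRESHOLD):
    """0 = unsuitable in both periods, 1 = shrinkage, 2 = expansion, 3 = stable."""
    present = 1 if present_prob >= threshold else 0
    future = 1 if future_prob >= threshold else 0
    return present + 2 * future


def change_map(present_grid, future_grid):
    """Apply change_code cell by cell to two grids of equal shape."""
    return [
        [change_code(p, f) for p, f in zip(present_row, future_row)]
        for present_row, future_row in zip(present_grid, future_grid)
    ]


# Hypothetical 2 x 2 presence probability grids.
present_grid = [[0.10, 0.50], [0.20, 0.70]]
future_grid = [[0.05, 0.20], [0.45, 0.65]]
codes = change_map(present_grid, future_grid)
```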

Main environmental variables affecting the potential geographical distribution of B. platyphylla
According to the Jackknife method, percentage contribution rate, and permutation importance values of the model output, Bio12, Bio18, Bio10, Bio1, and Bio14 were the main environmental variables affecting the potential geographical distribution of B. platyphylla. According to the response curves of the environmental variables to the presence probability in the MaxEnt model (Fig 3), taking the presence probability greater than 0.2932 as the selection condition of [...]
We used ArcGIS 10.3 software to extract the area of the different suitable areas (Table 2). The total suitable area for B. platyphylla was 168.75 × 10⁴ km²; the less suitable area (52.05 × 10⁴ km²), moderately suitable area (55.28 × 10⁴ km²), and highly suitable area (61.42 × 10⁴ km²) accounted for 5.50%, 5.84%, and 6.49% of the total area, respectively. Among provinces, Sichuan Province had the largest total suitable area, accounting for 12.38% of the total suitable area in China, as well as the largest highly suitable area, accounting for 18.63% of the highly suitable area in China. Shaanxi Province had the largest moderately suitable area, accounting for [...]
[...] 5) and changes in distribution (Fig 6) of suitable area for B. platyphylla under different climate scenarios in the 2050s and 2070s were obtained. [...] Mountains. Under all scenarios, the suitable areas in Sichuan, Shaanxi, and Inner Mongolia accounted for more than 30% of the total suitable area. Table 2 also shows the suitable area of B. platyphylla under future climate change scenarios. In the 2050s, the highly suitable area for B. platyphylla under RCP6.0 was 66.14 × 10⁴ km²; in the 2070s, RCP2.6 had the largest highly suitable area for B. platyphylla (67.35 × 10⁴ km²) among the climate change scenarios.
Under different climate change scenarios, the future potential geographical distribution of B. platyphylla was predicted to be relatively stable compared with the present distribution. Fig 6 and Table 3 present the changes in the potential geographical distribution of B. platyphylla under different climate change scenarios between the present and the 2050s or 2070s. Under RCP2.6, RCP4.5, RCP6.0, and RCP8.5, from the present to the 2050s, the total suitable area for B. platyphylla increased by 2.95%, 5.16%, 4.49%, and 5.18%, respectively; from the present to the 2070s, it increased by 4.31%, 3.23%, 4.59%, and 8.56%, respectively. Under RCP2.6 and RCP8.5, the total suitable area in the 2070s was 2.27 × 10⁴ km² and 5.70 × 10⁴ km² larger than that in the 2050s, respectively. Under RCP4.5, the total suitable area in the 2070s was 3.24 × 10⁴ km² smaller than that in the 2050s. Under RCP6.0, the total suitable area in the 2070s changed little compared with that in the 2050s. In the future scenarios, the total suitable area for B. platyphylla tended to expand; expansion was most evident in the southern section of the Greater Khingan Mountains, the Yinshan Mountains, the Changbai Mountains, the Daba Mountains, the Hengduan Mountains, and the eastern mountainous area of Qinghai Province. Shrinkage was most evident in the north of the Greater Khingan Mountains, the southeast of the Xiaoxing'an Mountains, the north of the Changbai Mountains, and the north of the Qinling Mountains. At present, there is no suitable area on the west side of the Tianshan Mountains, but a certain range of suitable area appears there in the 2050s and 2070s. On the west side of the Tianshan Mountains in Xinjiang, the total suitable area was largest under the 2070s RCP2.6 scenario (0.81 × 10⁴ km²) and smallest under the 2070s RCP4.5 scenario (0.47 × 10⁴ km²).
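The reported differences between the 2050s and the 2070s can be cross-checked from the percentage increases and the present total suitable area. The sketch below is only a consistency check built from the rounded figures quoted in the text, so small deviations from the reported areas are expected; the variable names are arbitrary.

```python
PRESENT_TOTAL = 168.75  # present total suitable area, in 10^4 km^2 (from the text)

# Percentage increase relative to the present: (2050s, 2070s), from the text.
increase_pct = {
    "RCP2.6": (2.95, 4.31),
    "RCP4.5": (5.16, 3.23),
    "RCP6.0": (4.49, 4.59),
    "RCP8.5": (5.18, 8.56),
}


def future_area(pct_increase):
    """Total suitable area implied by a percentage increase over the present."""
    return PRESENT_TOTAL * (1 + pct_increase / 100)


# Difference between the 2070s and the 2050s, in 10^4 km^2.
diff_2070s_vs_2050s = {
    rcp: future_area(p2070) - future_area(p2050)
    for rcp, (p2050, p2070) in increase_pct.items()
}
```

The implied differences agree with the reported 2.27, 3.24, and 5.70 × 10⁴ km² to within about 0.03 × 10⁴ km², which is consistent with rounding of the percentages, and the RCP6.0 difference is small, in line with the statement that the area changed little.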

Discussion
The output results of the MaxEnt model provide good reference values for important ecological issues, such as the prediction of species distribution and the impact of global warming on the suitable areas for species [61,62]. Exploring the present and future potential geographical distribution of species is of great significance for the protection, use, and sustainable management of species in the context of global warming [63]. Based on the species distribution data of B. platyphylla and the MaxEnt model, combined with the present climate and different climate change scenarios in the 2050s and 2070s, this study predicted the potential geographical distribution of B. platyphylla under different climatic conditions and analyzed the dynamic changes in the suitable area. This will provide a reference for the cultivation and management of B. platyphylla forests.
Throughout tree cultivation and growth, temperature and precipitation are the most important driving factors; temperature and water availability affect the physiological activities and biochemical processes of trees. According to the prediction results of the model, under both present and future climate scenarios, the cumulative percentage contribution rates and cumulative permutation importance values of bioclimatic variables always exceeded 60%. Compared with soil and topographic variables, bioclimatic variables had the greatest impact on the potential geographical distribution of the species. The cumulative contribution rates and cumulative permutation importance values of precipitation-related variables were above 45%. The suitable area of B. platyphylla is mainly distributed in semi-humid and semi-arid areas, where vegetation distribution depends strongly on precipitation. However, due to the influence of latitude and altitude, the influence of temperature on the distribution of suitable areas is weakened. From another point of view, in the high-temperature season, precipitation can effectively lower the surface temperature. Thus, among bioclimatic variables, precipitation had a great impact on the potential geographical distribution pattern of B. platyphylla. Based on a comprehensive analysis of the changes in the potentially suitable area and the most influential environmental variables of B. platyphylla in the 2050s and 2070s under the four emission scenarios, annual precipitation, precipitation of the warmest quarter, annual mean temperature, and mean temperature of the warmest quarter were positively correlated with the suitable area for B. platyphylla; increases in these variables were associated with an increase in the suitable area. Meanwhile, precipitation of the driest month was negatively correlated with the suitable area for B. platyphylla; a decrease in precipitation of the driest month was associated with an increase in the suitable area. Among soil variables, topsoil base saturation, subsoil CEC (clay), topsoil CEC (clay), subsoil pH (H₂O), and other variables have a certain impact on the suitable area of B. platyphylla, mainly because B. platyphylla grows better in acidic soil and because soil moisture has different effects on B. platyphylla during different growth stages. Excessive soil moisture is harmful to B. platyphylla at the seedling stage but can effectively promote its growth at the growth stage. Different types of soil reflect different degrees of solar radiation, which affects the rate of photosynthesis and ultimately the growth of trees. According to the prediction results of the model, an altitude of 400-4500 m and a slope of 35˚ are suitable for the growth of B. platyphylla. Generally speaking, topography is an important driving factor of soil nutrients and water, so topography variables should be fully considered in the cultivation and planting of B. platyphylla forests.
Under the four climate change scenarios, the temperature increased to different degrees compared with the present, although the range of suitable areas for B. platyphylla remained similar. Under the current and future climate change scenarios, the predicted potential geographical distribution of B. platyphylla was larger than the actual distribution. In the Changbai Mountains, Xiaoxing'an Mountains, and Greater Khingan Mountains forest areas in Northeast China, highly suitable areas accounted for 7.65% of the total highly suitable area in China. In this study, Betula platyphylla var. mandshurica was excluded. Previous studies have found that B. platyphylla has high genetic diversity, reflecting the genetic variation of B. platyphylla in Northeast China [64]. If the prediction were based on the broader species category, the results might be more consistent with the current actual distribution. However, this study was not intended to make predictions from the perspective of species diversity, so only the sample data of B. platyphylla were considered. This also shows that the niche represented by species distribution data is only a part of the actual ecosystem [65].
Many studies have shown that global warming will lead to the reduction or even total loss of suitable habitat for species [66][67][68]. In contrast, in this study global climate change is predicted to increase the suitable area for B. platyphylla, which is consistent with predictions of the potential geographical distribution of Juglans regia L. in China [69] and of endangered medicinal plants in Yunnan [70]. This suggests that there will be more suitable areas for B. platyphylla cultivation in the future. B. platyphylla has a very high utilization value for humans, resulting in high market demand. Moreover, B. platyphylla forest is of great significance for maintaining the ecological balance of forests. In the semi-arid area of the Loess Plateau, it can effectively improve nutrient fixation capacity [71]. It is also very sensitive to salt stress; thus, breeding new varieties of B. platyphylla with high salt tolerance will help to improve the ecological environment in arid and saline-alkali areas [72]. B. platyphylla plays an important role in regional carbon sequestration. The annual net productivity and annual net carbon sequestration of B. platyphylla forest increase with tree age [73]. Nature-based climate change mitigation strategies have the potential to significantly reduce greenhouse gas emissions [74]. According to a report by the Intergovernmental Panel on Climate Change (IPCC), development strategies such as afforestation, reforestation, and improved forest management can play key roles in the global emission reduction portfolio [75,76]. Thus, the findings of this study could inform governments and agencies about the most suitable areas in which to cultivate birch forests to conserve water sources, beautify the environment, help alleviate global warming, and bring higher economic benefits.
Compared with other studies, in which analyses of the effects of climate change on the potential geographical distribution of species were limited to bioclimatic variables [77][78][79], the novelty of this study lies in the comprehensive consideration of climate, soil, and topographic variables in modeling the potential geographical distribution pattern of B. platyphylla. The soil physicochemical properties were not limited to a few commonly used soil variables but drew on data from the world soil database. This study also has some limitations. The distribution of a species is not only affected by climate, soil, and topography. Considering human activities related to land cover, hydrogeological conditions, road distribution, and residential distribution would also improve the accuracy of the simulation of the potential geographical distribution of B. platyphylla. The purpose of this study was to predict the potential geographical distribution of B. platyphylla in China on a large scale. According to the results of this study, small-scale field experiments were carried out in Northeast China, the Qinba Mountains, and the Hengduan Mountains, which host a wide distribution of suitable areas for B. platyphylla, to provide more accurate guidance for afforestation projects in China.

Conclusions
B. platyphylla is an important broad-leaved timber species in China with economic and ecological value. Based on species distribution data and environmental variables such as climate, soil, and topography, the potential geographical distributions of B. platyphylla at present and in the 2050s and 2070s under different climate scenarios were predicted. The main environmental variables influencing the geographical distribution were analyzed, and the range and change in suitable areas for B. platyphylla under different climate scenarios were compared. The results show that the suitable area of B. platyphylla in China extends from the Xiaoxing'an Mountains in Northeast China to the Hengduan Mountains in Southwest China. Under the climate warming scenarios, the suitable area of B. platyphylla will expand further. Through artificial cultivation of B. platyphylla forest, the structure of forestry development can be optimized and the supply of forest resources enriched. The carbon fixation and water conservation functions of B. platyphylla forest are of great significance for maintaining the ecological balance of forests. At the same time, forest by-products will also produce considerable economic benefits. Our research will provide more accurate guidance for afforestation projects in China, a scientific basis for the investigation and sustainable utilization of B. platyphylla resources, and an important reference for the management and cultivation of B. platyphylla forests.